TESTS  FOR  UNIFORMITY  ARISING  FROM  A
SERIES  OF  EVENTS 


1.  INTRODUCTION. 

A  number  of  problems  in  statistics  reduce  to  testing  whether  a  set 
of  values  comes  from  a  uniform  distribution:  when  the  limits  of  this 
distribution  are  A,B  we  write  it  as  U(A,B).  Most  tests  reduce  to  a 
test  for  U(0,1).  We  concentrate  here  on  test  situations  which  arise 
from  examining  the  times  at  which  events  occur.  The  scientist  records 
these  times  on  the  time-axis:  how  should  they  then  be  analyzed? 

If  the  events  are  governed  by  a  Poisson  process  (that  is,  are  occurring 
randomly  in  time),  the  intervals  between  events  should  have  an  exponential 
distribution  and  should  be  independent.  If  not,  certain  alternatives  are 
often  of  interest.  We  therefore  pose  four  questions  which  arise: 

Question  1.  Are  the  time  intervals  exponential,  or 

Question  2.  Are  the  time  intervals  too  regular  or  variable  to  be 
exponential? 

Question  3.  Are  the  intervals  independent? 

Question  4.  Is  a  cluster  of  events  (or  more  than  one)  present? 


Formally, these problems can be set up in relation to the times of
events, as follows. Let $0 < T_{(1)} < T_{(2)} < \cdots < T_{(n)}$ be the
times of observed events, occurring with continuous observation in the
interval $(0,T)$; let $U_{(i)} = T_{(i)}/T$, $i=1,\ldots,n$, and define
$U_{(0)} = 0$ and $U_{(n+1)} = 1$. Let the intervals between events be
$E_i = T_{(i)} - T_{(i-1)}$ for $i=1,\ldots,n+1$, with $T_{(0)} = 0$ and
$T_{(n+1)} = T$; these lead to spacings $D_i = U_{(i)} - U_{(i-1)}$,
$i=1,\ldots,n+1$, between the values $U_{(i)}$, and $D_i = E_i/T$.
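The conversion from event times to scaled values and spacings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `to_uniforms_and_spacings` and the sample times are hypothetical, and the observation length T is assumed known.

```python
def to_uniforms_and_spacings(times, T):
    """Scale event times on (0, T) to U_(i) and return the n+1 spacings D_i."""
    t = sorted(times)
    if not t or t[0] <= 0 or t[-1] >= T:
        raise ValueError("event times must lie strictly inside (0, T)")
    u = [ti / T for ti in t]                  # U_(i) = T_(i) / T
    padded = [0.0] + u + [1.0]                # U_(0) = 0 and U_(n+1) = 1
    d = [padded[i] - padded[i - 1] for i in range(1, len(padded))]
    return u, d


# Hypothetical data: four events observed on (0, 10).
u, d = to_uniforms_and_spacings([1.0, 2.5, 4.0, 7.0], T=10.0)
```

The n event times give n+1 spacings, and the spacings always sum to one, which is why they cannot be independent.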


Question 1 poses the question whether or not the $E_i$ come from the
exponential distribution $F(x) = 1-\exp(-x/\beta)$, $x > 0$; when they do
and when they are independent, the transformation above gives a set
$U_{(i)}$ which are ordered uniforms from U(0,1). Thus the test for the
Poisson process becomes a test of uniformity, specifically, a test of

$H_0$: the $U_{(i)}$ are ordered uniforms from U(0,1).

The transformation from $T_{(i)}$ to $U_{(i)}$ effectively eliminates the
unknown parameter $\beta$, which is of course connected with the unknown
Poisson process rate $\lambda$.

When the times $T_{(i)}$ are given, it is very natural to examine the four
questions above by transforming to the $U_{(i)}$ and then making a test of
$H_0$.

In this article we discuss, in this context, many of the possible tests
available, and note some features which should be emphasized and on which
further work is needed. For example, for some realistic alternatives some
test statistics might have to be used with the lower tail whereas use of
the upper tail is customary, thus changing accepted power comparisons, or,
for others, existing results on power must be discounted because they do
not apply to the practical alternatives. It may even be, of course, that
the best tests of questions 1 to 4 are not made at all by transforming to
the $U_{(i)}$ and then testing $H_0$.

This article forms part of a volume in honor of Professor Herbert Solomon
and among the test statistics are some on which Professor Solomon and I
have worked together. These concern Neyman's tests and tests on spacings.
Perhaps we should also note that many of the distributional properties of
tests for uniformity involve elegant geometric probability arguments:
Professor Solomon has maintained a long interest in this field (Solomon,
1978), and it was my work on the null distribution of the goodness-of-fit
statistic $U^2$, using arguments of geometric probability, which attracted
his attention and so began our association many years ago.

2.  ALTERNATIVES  TO  UNIFORMITY. 

We  first  consider  some  situations  of  interest  in  connection  with  Questions 
1  and  2. 

(a) Testing That Lifetimes Are Exponential. Suppose the times $T_{(i)}$
are breakdown times for a machine because a part has failed; at each
breakdown the part is immediately replaced, so that the intervals $E_i$
are the lifetimes of the part.

It is common in reliability theory to test that such lifetimes are
exponential, and, assuming they are independent, the resulting $T_{(i)}$
will be a realization of a Poisson process. The lifetime distributions,
alternative to the exponential, which one might wish to detect, are then
often the Weibull distribution

$$F_W(x) = 1 - \exp\{-(x/\beta)^{\gamma}\}, \quad x > 0,$$

and the Gamma distribution $F_G(x) = \int_0^x f(t)\,dt$, where

$$f(t) = \frac{1}{\beta^{\gamma}\,\Gamma(\gamma)}\, t^{\gamma-1} e^{-t/\beta}, \quad t > 0.$$

If the $E_i$ are from $F_G(x)$ or $F_W(x)$, we describe the resulting
$D_i = E_i/T$ as scaled Gamma or scaled Weibull spacings; it will be
important that a test for uniformity of the $U_{(i)}$ should be able to
detect scaled Gamma or scaled Weibull spacings. When $\gamma > 1$, in both
Weibull and Gamma distributions, the lifetimes will be less spread out,
for a given mean, than if they were exponential (the coefficient of
variation is $< 1$, whereas for exponential it is 1); we can call these
lifetimes super-regular. Super-regular lifetimes will produce a set
$U_{(i)}$ which are excessively evenly spaced, more so than expected for a
uniform sample. Stephens (1986b) calls such $U_{(i)}$ superuniform, and
tests of $H_0$, in this context, should be able to detect superuniform
$U_{(i)}$. Super-regular lifetimes might be expected to occur, for
example, if there is a high level of quality control for the machine
component. If $\gamma < 1$, the lifetimes are super-variable, and lead to
supervariable spacings between the $U_{(i)}$.
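The statement about the coefficient of variation can be checked numerically. The sketch below uses the standard moment formulas for the Gamma and Weibull distributions with shape parameter gamma (the scale cancels); the function names `cv_gamma` and `cv_weibull` are illustrative only.

```python
import math


def cv_gamma(shape):
    """Coefficient of variation of a Gamma distribution: 1 / sqrt(shape)."""
    return 1.0 / math.sqrt(shape)


def cv_weibull(shape):
    """Coefficient of variation of a Weibull distribution with given shape."""
    g1 = math.gamma(1.0 + 1.0 / shape)   # mean / scale
    g2 = math.gamma(1.0 + 2.0 / shape)   # second moment / scale**2
    return math.sqrt(g2 / g1 ** 2 - 1.0)


# shape = 1 is the exponential case (CV = 1); shape > 1 gives CV < 1
# (super-regular lifetimes) and shape < 1 gives CV > 1 (super-variable).
for shape in (0.5, 1.0, 2.0):
    print(shape, cv_gamma(shape), cv_weibull(shape))
```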


In reliability theory an important feature of a distribution is the
failure rate, or hazard rate, given by $b(x) = f(x)/\{1-F(x)\}$. A
distribution with decreasing failure rate (a DFR distribution) is such
that $b(x)$ decreases as $x$ increases, and for an increasing failure rate
(IFR) distribution, $b(x)$ increases with $x$. An exponential distribution
has constant failure rate $1/\beta$ for all values of $x$; Gamma and
Weibull distributions $F_G(x)$ and $F_W(x)$ have DFR if $\gamma < 1$, and
IFR if $\gamma > 1$. If lifetimes come from, say, a DFR Weibull
distribution, the spacings between smaller
a mental picture of the relative sizes of exponential spacings.
However, the following transformation can be used. Let $m = n+1$; we have
$m$ lifetimes $E_i$, including the last (unfinished) lifetime
$T - T_{(n)}$. Let $E'_i = (m+1-i)(E_{(i)} - E_{(i-1)})$, where $E_{(i)}$
are the ordered $E_i$ ($i=1,\ldots,m$; $E_{(0)} \equiv 0$). If the
original $E_i$ were exponentials, the $E'_i$ will be unordered independent
exponentials with the same scale parameter, but if the $E_i$ were DFR, the
$E'_i$ will, on average, be increasing with $i$. Correspondingly reversed
results hold for lifetimes from an increasing failure rate (IFR)
distribution. Note that when the $E_i$ are independent exponential, giving
rise to independent exponential $E'_i$, these intervals can be used to
construct "times" $T'_{(1)} = E'_1$, $T'_{(2)} = E'_1 + E'_2$,
$T'_{(3)} = E'_1 + E'_2 + E'_3$, etc., and, by scaling to give
$U'_{(i)} = T'_{(i)}/T'_{(n+1)}$, the $U'_{(i)}$, $i = 1,\ldots,n$, should
be (ordered) U(0,1). Tests based on the $U'_{(i)}$ are often used to test
for exponentiality of the original $E_i$; however, we have now moved away
from the naturally occurring time sequence and we shall not consider
these tests at present: see Section 4.
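The transformation to normalized spacings E'_i = (m+1-i)(E_(i) - E_(i-1)) and then to scaled partial sums can be sketched as follows. The function names are hypothetical, and the input is assumed to be the full list of m interval lengths.

```python
def normalized_spacings(intervals):
    """E'_i = (m + 1 - i) * (E_(i) - E_(i-1)), i = 1..m, with E_(0) = 0."""
    e = sorted(intervals)            # ordered lifetimes E_(1) <= ... <= E_(m)
    m = len(e)
    prev = 0.0
    out = []
    for i, ei in enumerate(e, start=1):
        out.append((m + 1 - i) * (ei - prev))
        prev = ei
    return out


def scaled_partial_sums(e_prime):
    """U'_(i) = T'_(i) / T'_(m) for i = 1..m-1, from partial sums of E'_i."""
    total = sum(e_prime)
    u, running = [], 0.0
    for x in e_prime[:-1]:           # the last partial sum is always 1
        running += x
        u.append(running / total)
    return u


e_prime = normalized_spacings([3.0, 1.0, 2.0])   # hypothetical lifetimes
u_prime = scaled_partial_sums(e_prime)
```

The E'_i always add up to the same total as the original intervals, so the scaling divides by the same total time. Plotting the E'_i against i then gives a rough visual check for a DFR or IFR trend.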


(b) Testing That Lifetimes are Exponential, but with Average Lifetime
Changing with Time. Again let the times $T_{(i)}$ be breakdown times, and,
for a useful illustration, suppose they are mostly quite far apart. The
lifetime of the replacement component might remain exponential as time
passes, but, perhaps for instance because of improved manufacturing
methods, the average lifetime ($\beta$) increases with time. On the whole,
then, the $E_i$ are gradually becoming longer as time goes on; the process
generating the $T_{(i)}$ can be viewed as Poisson but with rate $\lambda$
no longer constant, but taking value $\lambda(t)$ varying with time. One
would then expect the $U_{(i)}$, in this example, to be closer together at
the left end (zero) than at the right end (one).
It might be worth observing that the distinction between the two
situations, (a) and (b) above, can easily be blurred by conventional
terminology. In (b), the machine breaks down less often if the component
has increased lifetime, and the machine might then be colloquially
described as having a decreasing failure rate (DFR). However, this is not
the conventional use of this phrase in the theory of reliability.

In the lifetime model (b) just discussed, the exponential distribution
is not fixed, but is changing as time goes by; $\beta$ is increasing, and
hence the spacings are apparently getting longer. A graph of $E_i$ against
$i$ would be increasing, whereas if $\beta$ remained constant the values
should hover around the horizontal line $E = \beta$. Also, if the $E_i$
came from a fixed Weibull or Gamma distribution, the $E_i$, plotted
against $i$, would again be horizontal around the mean value of the
distribution. Tests might be based on such graphs: see Stephens (1986b)
for further comment.

To sum up this discussion, we might detect Gamma or Weibull alternatives
by looking for supervariable or superregular spacings between the
$U_{(i)}$; or, when the spacings are superregular, by looking directly for
superuniform $U_{(i)}$ themselves. If the lifetimes are exponential but
with changing $\beta$, we must look for a drift of the $U_{(i)}$ towards
one end or the other. However, other methods may be better than any of
these, as we note in Section 4 below.

Of course, intervals can be superregular or supervariable without
coming from Gamma or Weibull distributions, and a distributional test
is then not of primary interest. The events might be, for example,
earthquakes, eruptions of volcanoes, or signals from a "black hole". If
these are not Poisson, two important alternatives are that they are
occurring at fairly regular intervals, or perhaps the opposite; that is,
the $E_i$ are superregular, or supervariable (Question 2 above).
Earthquakes may occur in clusters, with several aftershocks and then long
intervals between the clusters; the $E_i$ would then appear supervariable.
Regular intervals might be detected by eye, but overly dispersed intervals
are much less easy to observe.

Questions 3 and 4. It might also be of interest, in observing the
$T_{(i)}$, to detect a cluster, that is, a bunch of values too close to be
represented by the same Poisson process as the others; for example,
superimposed on random signals from the black hole, the friends of E.T.
are sending frequent messages for him to call home. Detecting one or more
important clusters will be very similar to detecting if the overall
interval pattern shows too much dispersion.

Finally, there might be situations in which the successive intervals
$E_i$ are correlated, for example, a large one might tend to be followed
by a small one. This could, for example, be the case when the $E_i$
represent lengths of reigns of monarchs, where a long reign is followed by
a short one because the heir is already older; in fact this may explain
why the dates of reigns of English monarchs appear to be superuniform
(Pearson, 1963).

3.  TEST  STATISTICS  AND  THE  VARIOUS  ALTERNATIVES.

In this section we examine how well test statistics for uniformity
might be expected to detect the alternatives discussed in Section 2. The
given set of times $T_{(i)}$ will be assumed to be converted to uniforms
$U_{(i)}$ by $U_{(i)} = T_{(i)}/T$, $i=1,\ldots,n$, as described above.
(If $T$ is not known, we can take $T = T_{(n)}$, and have only $n-1$
uniforms $U_{(i)}$, since $U_{(n)} \equiv 1$.) Test statistics for the
null hypothesis $H_0$: the $U_{(i)}$ are ordered uniforms from U(0,1), can
be roughly classified into four families: (1) the Pearson $X^2$ statistic;
(2) Neyman smooth tests; (3) EDF tests; (4) tests based on spacings.
Comments follow on each of these families, and we shall finally
concentrate on family (4).

(1) Pearson's $X^2$ statistic. This requires that the $U_{(i)}$ be
classified into groups, preferably groups of equal probability; thus the
line (0,1) is divided into $k$ cells, and if $N_j$ is the number of
$U_{(i)}$ falling into the $j$-th cell, Pearson's statistic is
$X^2 = (k/n)\sum_{j=1}^{k}(N_j - n/k)^2$, with asymptotically a
$\chi^2_{k-1}$ distribution. The grouping into cells loses much of the
information in the $U_{(i)}$, especially for a small sample, and the
$X^2$-statistic has low power against most alternatives (Stephens, 1974;
Quesenberry and Miller, 1977). Also, as usually used, large values of
$X^2$ lead to rejection of $H_0$: $X^2$ will not then detect
super-regularity of intervals unless small values are declared significant
also.
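As a small illustration of the grouping step, the following sketch computes Pearson's statistic for k equiprobable cells on (0,1), using the usual form of the statistic with expected count n/k per cell. The function name `pearson_x2` and the sample values are hypothetical.

```python
def pearson_x2(u, k):
    """Pearson's X^2 for values in (0, 1) grouped into k equiprobable cells."""
    n = len(u)
    counts = [0] * k
    for x in u:
        j = min(int(x * k), k - 1)    # cell index; x == 1.0 goes in last cell
        counts[j] += 1
    expected = n / k                  # expected count per cell under H0
    return sum((c - expected) ** 2 for c in counts) / expected


evenly_spread = pearson_x2([0.1, 0.3, 0.6, 0.9], k=2)   # counts [2, 2]
one_sided = pearson_x2([0.1, 0.2, 0.3, 0.4], k=2)       # counts [4, 0]
```

A perfectly balanced sample gives the smallest possible value, zero. So excessively regular data show up only in the lower tail of the statistic.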


(2) Neyman Smooth Tests. Neyman suggested that an alternative density to
uniformity for $U$ could be written

$$f(x) = c \exp\Big\{1 + \sum_{j=1}^{k} \theta_j \pi_j(x)\Big\}, \quad 0 < x < 1, \quad k = 1, 2, \ldots,$$

where $\pi_1(x), \pi_2(x), \ldots$ are Legendre polynomials,
$\theta_1, \theta_2, \ldots$ are parameters, and $c$, a function of the
$\theta_j$, is a normalizing constant. The Legendre polynomials are
orthogonal on (0,1) and, by varying $k$, $f(x)$ may be made to approximate
any given alternative as closely as desired. Uniformity requires that all
$\theta_j = 0$; thus the test of $H_0$ can be put as a test that
$\sum_{j=1}^{k} \theta_j^2 = 0$. By likelihood ratio methods, Neyman
formed the test statistic $N_k^2 = \sum_{j=1}^{k} v_j^2$, where $v_j$ is a
Neyman component dependent on $\sum_{i=1}^{n} \pi_j(x_i)$. The interesting
point of this method, in the present context, is that the first two
components are functions of $\bar{U}$ and of the variance
$s^2(U) = \sum_i (U_i - 0.5)^2/n$. A significant value for $\bar{U}$ might
well occur in connection with Question 1, if the lifetimes were
exponential, but $\beta$ was changing in time; and a significant value for
$s^2(U)$ might arise if there is a cluster of events, or possibly
negatively auto-correlated intervals, both tending to give a small
variance of the $U$ values. Percentage points for these statistics are
given by Stephens (1966). (Note that it is the variance of the $D_i$ which
would be required to examine Question 2; see Greenwood's statistic below.)
The individual components $v_j$ are normalized to be N(0,1)
asymptotically; they are then also independent, so that for large $n$,
$N_k^2$ is approximately $\chi^2_k$. Some studies have indicated that
$N_2^2$ is a good test statistic for general alternatives; the addition of
further components can often weaken the overall power of $N_k^2$. However,
$N_k^2$ approaches its asymptotic distribution only slowly: Solomon and
Stephens (1983) have recently given percentage points of $N_2^2$ based on
fitting Pearson curves. Curve-fitting using moments, to obtain percentage
points, has been one of our research interests in recent years (Solomon
and Stephens, 1978), since computers have made such techniques much more
practicable. Further comments on Neyman's statistics are in Solomon and
Stephens (1983).
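A minimal sketch of the first two Neyman components follows, assuming the usual normalized Legendre polynomials on (0,1), namely sqrt(3)(2x-1) and sqrt(5)(6x^2-6x+1), and the scaling v_j = sum_i pi_j(U_i)/sqrt(n). The function names are illustrative and not taken from the papers cited.

```python
import math


def neyman_components(u):
    """First two Neyman components v1, v2 for a sample u in (0, 1)."""
    n = len(u)
    # Normalized Legendre polynomials on (0, 1), orders 1 and 2.
    pi1 = [math.sqrt(3.0) * (2.0 * x - 1.0) for x in u]
    pi2 = [math.sqrt(5.0) * (6.0 * x * x - 6.0 * x + 1.0) for x in u]
    v1 = sum(pi1) / math.sqrt(n)      # equals sqrt(12 n) * (mean(u) - 0.5)
    v2 = sum(pi2) / math.sqrt(n)      # equals 6 sqrt(5 n) * (s2 - 1/12)
    return v1, v2


def neyman_n2(u):
    """Neyman's statistic of order 2: the sum of squared components."""
    v1, v2 = neyman_components(u)
    return v1 * v1 + v2 * v2


v1, v2 = neyman_components([0.25, 0.75])   # hypothetical two-point sample
```

With this scaling v1 depends only on the sample mean and v2 only on the mean squared deviation from 0.5, so the two can be inspected separately for drift and for a change in spread.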


(3) EDF Tests. Tests based on the empirical distribution function are
becoming increasingly well-known. The most famous of these is $D$, the
Kolmogorov statistic; it is calculated from
$D^- = \max_i\{U_{(i)} - (i-1)/n\}$ and $D^+ = \max_i\{i/n - U_{(i)}\}$;
$D = \max(D^+, D^-)$. Other statistics are $V = D^+ + D^-$;

$$W^2 = \sum_{i=1}^{n}\{U_{(i)} - (2i-1)/(2n)\}^2 + 1/(12n); \quad U^2 = W^2 - n(\bar{U} - 1/2)^2,$$

where $\bar{U} = \sum_i U_{(i)}/n$; and

$$A^2 = -\Big[\sum_{i=1}^{n} (2i-1)\{\log U_{(i)} + \log(1 - U_{(n+1-i)})\}\Big]/n - n.$$

As usually used, all these statistics are significant with large values
only. (Note that $U^2$ is a statistic calculated from the $U_{(i)}$ of the
sample; this terminology makes an unfortunate double use of $U$, but $U^2$
is the usual name for this statistic so we retain it.)

Statistic $D^+$ can be expected to detect a drift of $U$ toward 0,
$D^-$ a drift toward 1; however, if the direction of drift is not known,
$D$ must be used, and then it will often be less powerful than $W^2$ and,
more particularly, $A^2$. As usually used, the statistics will not detect
excessive regularity in the intervals; this would produce small values of
the statistics, so lower tail tests must be considered. Autocorrelation
will tend also to produce low values of these statistics. A cluster
might be detected by $U^2$ or $V$, with large values significant, since
these detect change in variance of the $U$-set. Further comparisons of
tests for these situations need to be made.
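The EDF statistics can be computed directly from the ordered sample. The sketch below follows the standard computing formulas for D+, D-, D, V, W^2, U^2 and A^2; the function name `edf_statistics` and the dictionary keys are illustrative choices, and all values are assumed to lie strictly inside (0,1) so that the logarithms exist.

```python
import math


def edf_statistics(u):
    """EDF statistics for testing that u is a sample from U(0, 1)."""
    x = sorted(u)
    n = len(x)
    # enumerate() is 0-based, so the 1-based index i is idx + 1.
    d_plus = max((idx + 1) / n - xi for idx, xi in enumerate(x))
    d_minus = max(xi - idx / n for idx, xi in enumerate(x))
    w2 = sum((xi - (2 * idx + 1) / (2 * n)) ** 2
             for idx, xi in enumerate(x)) + 1.0 / (12 * n)
    ubar = sum(x) / n
    a2 = -sum((2 * idx + 1) * (math.log(x[idx]) + math.log(1.0 - x[n - 1 - idx]))
              for idx in range(n)) / n - n
    return {
        "D+": d_plus,
        "D-": d_minus,
        "D": max(d_plus, d_minus),
        "V": d_plus + d_minus,
        "W2": w2,
        "U2": w2 - n * (ubar - 0.5) ** 2,
        "A2": a2,
    }


# A perfectly regular (superuniform) sample gives the smallest possible W2.
regular = edf_statistics([0.125, 0.375, 0.625, 0.875])
```

For the perfectly regular sample W^2 reduces to 1/(12n), its minimum, which illustrates why excessive regularity can only be detected in the lower tail.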

(4) Spacings. The final group of tests to be considered here is the group
based on the spacings $D_i = E_i/T$. Note that because the $E_i$ are
divided by their total, the $D_i$ are not independent ($\sum_i D_i = 1$),
nor are they distributed exponentially on $H_0$: the marginal distribution
of any one spacing is $F(x) = 1-(1-x)^n$, $0 \le x \le 1$. Greenwood
(1946) was one of the first to propose a test based on the $D_i$:
Greenwood's statistic is $G = \sum_i D_i^2$. This statistic was
investigated by Moran (1947, 1953, 1981); interest has revived again in
recent years, and percentage points for finite $n$ have been given by
Burrows (1979), Currie (1981) and Stephens (1981). There has also been a
great deal of interest in other functions of the $D_i$ (see Pyke, 1965),
and also in test statistics based on k-spacings. A k-spacing, for $k$
fixed, is $D_{i,k} = U_{(i+k)} - U_{(i)}$, $i=0,1,\ldots,n+1-k$; clearly
the $D_i$ above are the 1-spacings ($D_i = D_{i-1,1}$), and for this
special case $k=1$ the second subscript will be omitted. Also $D_{i,k}$ is
the sum of $k$ adjacent $D_i$, and as $i$ varies, the $D_{i,k}$ overlap:
non-overlapping spacings will be defined by
$D^*_{i,k} = U_{(i+k)} - U_{(i)}$, $i = 0, k, 2k$, etc. The k-spacings are
sometimes called gaps or stretches. Much interesting work on the
properties of k-spacings and statistics based on these has been done in
recent years; for references see Stephens (1986a). Our interest here will
be to see how this work relates to the questions posed above.

Major  families  of  test  statistics  based  on  spacings  are: 


(a) Greenwood's $G$ and its extensions. The natural extension for $G$ is
$G_k = \sum_i D_{i,k}^2$ for $k = 1, 2, 3$, etc., with the sum defined
over the possible $i$-values ($G$ is then $G_1$), or
$G^*_k = \sum_i (D^*_{i,k})^2$.

(b) $L_k = -\sum_i \log_e D_{i,k}$ or $L^*_k = -\sum_i \log_e D^*_{i,k}$.

(c) $M_k = \max_i D_{i,k}$ and $m_k = \min_i D_{i,k}$, for given $k$.

Statistic $L_1$ was first suggested by Moran (1951) in connection with
testing for randomness of events. $L_1$ is the Maximum Likelihood test
statistic for exponentiality of intervals against Gamma alternatives, so
we might expect high power tests of Question 1 against this alternative.
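The three families can be sketched with one helper that forms the k-spacings, overlapping or non-overlapping, with the end points 0 and 1 included as U_(0) and U_(n+1). All function names here are hypothetical labels for the sum-of-squares, sum-of-logs and extreme-spacing statistics.

```python
import math


def k_spacings(u, k=1, overlapping=True):
    """D_{i,k} = U_(i+k) - U_(i), with U_(0) = 0 and U_(n+1) = 1."""
    p = [0.0] + sorted(u) + [1.0]
    step = 1 if overlapping else k        # i = 0, 1, 2, ... or i = 0, k, 2k, ...
    return [p[i + k] - p[i] for i in range(0, len(p) - k, step)]


def greenwood(u, k=1, overlapping=True):
    """Sum of squared k-spacings (Greenwood's G when k = 1)."""
    return sum(d * d for d in k_spacings(u, k, overlapping))


def log_spacings(u, k=1, overlapping=True):
    """Minus the sum of log k-spacings (Moran's statistic when k = 1)."""
    return -sum(math.log(d) for d in k_spacings(u, k, overlapping))


def extreme_spacings(u, k=1):
    """Largest and smallest overlapping k-spacing."""
    d = k_spacings(u, k)
    return max(d), min(d)


u = [0.25, 0.5, 0.75]                     # perfectly regular example
g1 = greenwood(u)                         # smallest possible value, 1/(n+1)
```

For equally spaced points the sum of squared 1-spacings attains its minimum 1/(n+1), so small values of that statistic point to super-regularity and large values to super-variability.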
Also, $G$ is essentially the variance of the spacings: $n$ times the
variance is $G - 1/(n+1)$, so $G$ might be useful in discussing Question
1. Of the many results concerning spacings statistics we highlight the
following.


(a) Percentage points for $G$ ($= G_1$) have been referred to above.
Points for $G_2$ and $G_3$, and also for $L_k$, for finite $n$, have
recently been given by McLaren and Stephens (1985). These were based on
curve-fitting. $G_k$ converges to asymptotic normality, but only very
slowly: $L_k$ converges also slowly, to a distribution approximated by a
$\chi^2$ distribution.

(b) Suppose the alternative to uniformity is
$f(x) = 1 + \ell(x)/n^{1/2}$. Cibisov (1961) then showed the asymptotic
relative efficiency (ARE) of tests based on spacings to be zero compared
to EDF tests. However, Weiss (1965) gives an alternative which reverses
this result. Among spacings themselves, Cressie (1979) showed,
essentially, that tests using $D_{i,k}$ were asymptotically better than
those using $D^*_{i,k}$ for alternatives which were step-functions,
tending to the uniform like $n^{-1/4}$, slower than Cibisov's, and among
such tests $G_k$ was better than $L_k$ using ARE as criterion.


However, this cannot be the whole story: we have already observed that
$L_1$ is the likelihood ratio test statistic in testing the spacings
versus scaled Gamma spacing alternatives, and should have some optimal
properties. The explanation appears to lie in the fact that the scaled
Gamma spacings alternatives do not give a density for $U$ which is either
on the Cibisov or the Cressie model. Similar results appear to hold for
Weibull spacings.


(c) McLaren and Stephens (1985) have investigated tests of $H_0$ against
the alternative that the $D_i$ are scaled Gamma spacings (Question 1 above
and the Gamma alternative) and have found $L_k$ better than $G_k$. McLaren
(1985) has calculated the ARE of $G_k$ to $L_k$ for this situation. The
ARE $A_k$ of $G_k$ to $L_k$ is only 0.39 when $k = 1$ (showing the
expected excellent performance of $L_1$) but increases with $k$, to 0.74
and then to $A_{10} = 0.85$. From Monte Carlo studies for finite $n$, the
L-group clearly dominates the G-group, and they both are better than EDF
statistics. For detecting supervariable spacings (from a Gamma
distribution with $\gamma < 1$), all these statistics are used in the
"normal" way, with the upper tail significant.

For detection of super-regular intervals, from, for example, a Gamma
distribution with $\gamma > 1$, which might well occur in practice, all
the statistics must be used with the lower tail significant. A possible
test statistic for super-regular intervals would be $M_k$, looking to see
if the largest k-spacing were too small.

In the above study on $L_k$ and $G_k$, the powers declined considerably
with increasing $k$. This seems reasonable; we know $L_1$ to be the
Likelihood Ratio statistic, and unless the order in which the spacings
appear is held to be important, there is no reason why a statistic which
combines several in sequence should be especially effective.


(d) Autocorrelation. Quesenberry and Miller (1977) introduced the
statistic

$$Q = \sum_{i=1}^{n+1} D_i^2 + \sum_{i=1}^{n} D_i D_{i+1}$$

to test for uniformity taking into account the possibility of
autocorrelation between intervals. It is easy to see that
$G_2 = 2Q - D_1^2 - D_{n+1}^2$, so that asymptotically $Q$ is equivalent
to $G_2$. McLaren and Stephens (1985) report a study on power of tests for
autocorrelated intervals involving EDF statistics and also
$G_1, G_2, G_3$ and $L_1, L_2, L_3$. The alternatives to uniform spacings
(themselves dependent because $\sum_i D_i = 1$) were autocorrelated scaled
Gamma spacings. In this study, when correlation was positive, both the
$G_k$ and $L_k$ had power increasing with $k$; $G_2, G_3$ and $L_2, L_3$
were better

than  all  EDF  statistics;  L?  and  were  best  overall.  All  statistics 

were very poor at detecting negative autocorrelation such as might be
expected in some practical problems where large intervals are compensated
by small ones. Detection of this effect needs further study; in
particular, the statistic $Q^* = \sum_{i=1}^{n} D_i D_{i+1}$, which forms
part of $Q$, might be effective standing alone.
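The statistic Q (the sum of squared spacings plus the sum of products of adjacent spacings) and its lag-product part can be computed as below, together with the overlapping 2-spacing sum of squares, so that the identity G_2 = 2Q - D_1^2 - D_{n+1}^2 can be checked numerically. The function names and the example values are hypothetical.

```python
def spacings(u):
    """The n+1 simple spacings D_1, ..., D_{n+1} of a sample in (0, 1)."""
    p = [0.0] + sorted(u) + [1.0]
    return [p[i] - p[i - 1] for i in range(1, len(p))]


def q_statistics(u):
    """Return (Q, Q*): Q = sum D_i^2 + sum D_i D_{i+1}, Q* = sum D_i D_{i+1}."""
    d = spacings(u)
    squares = sum(x * x for x in d)
    lag_products = sum(d[i] * d[i + 1] for i in range(len(d) - 1))
    return squares + lag_products, lag_products


def g2(u):
    """Sum of squared overlapping 2-spacings U_(i+2) - U_(i), i = 0..n-1."""
    p = [0.0] + sorted(u) + [1.0]
    return sum((p[i + 2] - p[i]) ** 2 for i in range(len(p) - 2))


u = [0.1, 0.4, 0.8]                  # hypothetical scaled event times
q, q_star = q_statistics(u)
d = spacings(u)
identity_gap = g2(u) - (2.0 * q - d[0] ** 2 - d[-1] ** 2)
```

In this example `identity_gap` is zero up to rounding error, as the identity requires.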

(e) Searching for a Cluster. The EDF statistics $U^2$ and $V$, applied
to the set $U_{(i)}$, will tend to detect a cluster, or a separation into
two groups, one at each end of the (0,1) interval, since these statistics
detect a shift in variance of the $U_{(i)}$ from the expected uniform
value (Stephens, 1974). Similarly the Neyman component $v_2$ will detect
such a shift in variance. However, the presence of a cluster may not
influence the overall variance enough to register significance with these
statistics, and it is natural to look at the minimum k-spacing
$m_k = \min_i D_{i,k}$ (to see if the minimum k-order gap is too small) to
detect a cluster. Cressie (1977) has also examined the scan statistic: the
maximum number of observations, $N_L$, in a window of width $L$, as it
moves along the (0,1) interval. Much work has been done, in particular by
Naus, Weiss, and more recently Cressie, on statistics $m_k$ and $N_L$; see
references in the papers cited in this subsection and in Stephens (1986a).
As one might expect, there is a connection between the two statistics:
$P(N_L > k) = P(m_k \le L)$ (Naus, 1966), so that a test based on one is
equivalent to a test based on the other. Huntington and Naus (1975) gave
the exact distribution theory of $m_k$, but the formulas are difficult;
Cressie (1977) has given asymptotic theory, that, on $H_0$,
$P(n^{(k+1)/k} m_k > x) \to \exp(-x^k/k!)$ as $n \to \infty$ (recall that
$n$ is the number of uniforms $U_{(i)}$). Cressie has shown that the power
of $m_k$, against a step-function alternative tending to the uniform like
$n^{-1/4}$, is not as good as that of the $L$ or $G$ statistics. For other
alternatives to clustering, $m_k$ may have greater power, although it may
be difficult to define these. The modelling of major earthquakes and
aftershocks, where the aftershocks produce a cluster, would appear to be a
possible practical application. A difficulty in applying these statistics
is the choice of $k$, or the window width $L$; for $\alpha$-levels to be
correct, this must not be decided after looking at the times $T_{(i)}$,
although this is a natural temptation. It would also be valuable if a
sequential test were available, first seeing if $m_1$ is too small, then
$m_2$, etc.

A test for super-regularity of spacings might be based on the maximum
k-spacing $M_k = \max_i D_{i,k}$ (to see if the largest k-order gap is too
small). Deken (1980) and Solomon and Stephens (1981) have given
distribution theory and percentage points for this statistic for $n = 5$
and 10, and Deken gives also a Beta approximation for larger $n$. Similar
remarks to those above apply to the choice of $k$, and to the desirability
of a sequential test.
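The link between the smallest k-spacing and the scan statistic can be illustrated with a short sketch. It rests on one assumption that should be read carefully: the k-spacings here are taken between observed points only (the end points 0 and 1 are not included), so that "more than k points in some window of width L" is exactly the event "some k-spacing is at most L". Function names and data are hypothetical.

```python
def min_interior_k_spacing(u, k):
    """Smallest gap U_(i+k) - U_(i) between observed points only (k < n)."""
    x = sorted(u)
    return min(x[i + k] - x[i] for i in range(len(x) - k))


def scan_statistic(u, width):
    """Largest number of points in any closed window of the given width."""
    x = sorted(u)
    best, j = 0, 0
    for i in range(len(x)):
        # Advance j to the first point lying beyond the window [x[i], x[i] + width].
        while j < len(x) and x[j] - x[i] <= width:
            j += 1
        best = max(best, j - i)
    return best


u = [0.1, 0.12, 0.15, 0.5, 0.9]      # three early points form a cluster
width = 0.06
n_scan = scan_statistic(u, width)
for k in range(1, len(u)):
    # The two events coincide for every order k.
    assert (n_scan > k) == (min_interior_k_spacing(u, k) <= width)
```

It is enough to consider windows that start at an observed point, because sliding a window to the right until its left edge meets a point cannot reduce the number of points it contains.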

4.  FINAL  REMARKS. 

In this article we have tried to draw attention to some of the
outstanding questions which arise when tests on a series of events are
converted to tests of uniformity. There are many ways in which events may
depart from a random sequence, and this means that test statistics which
are valuable for detecting one type of alternative will not be valuable
for another. The choice of test statistics in some situations is still an
open question. Among factors to be considered are (a) some statistics may
be significant in the tail not usually used; (b) some statistics may have
a parameter which is difficult to choose (the order of Neyman's statistic,
for example, or of a spacing or set of spacings); (c) existing power
studies may not be applicable to the alternatives of practical concern.

Two other important issues should also be raised. One is that, with
modern computer techniques, many statisticians will calculate many
statistics and look at them all. Then the above factors, couched as they
are in the classical language of hypothesis testing, will be less
important: formal testing will not be applicable, since the final
significance level is impossible to determine, and the best way to use the
statistics is to allow them, or their significance levels, to throw light
on the data in the knowledge of what different alternatives might be
expected to give.

Another question to be considered is whether or not it is always
useful to use the transformation to $U_{(i)}$, simple though it may be. It
is persuasive that to discuss autocorrelation, or a change in exponential
parameter, or clustering, one would examine the times and time intervals
in situ; it is not so clear that to test that intervals are exponential
with constant $\beta$, as opposed, say, to Gamma or Weibull, one should
keep the intervals as they occur, and it may be best, for example, to look
at them in order of size. The construction of $E'_i$ and then $U'_{(i)}$,
in Section 2(a), uses the size order of the intervals, and an extensive
literature exists on tests based on the $E'_i$ or the $U'_{(i)}$; they are
related to the total time on test statistics and have much merit in terms
of power (for some discussion see Stephens, 1986b).


Finally, we should not forget the wider question which often lies
behind statistical examination of events: how to model the events
realistically. In all the applications which have been alluded to here
(the incidence of disease, lifetimes in reliability theory, signals from
outer space, earthquakes, eruption of volcanoes) and of course in many
others, a good model will suggest preferential statistical techniques.
Some interesting comments on modelling, relevant to spacings statistics,
are in the discussion to Pyke (1965) and the points made then are still
pertinent twenty years later. This article merely attempts to see what
different statistics might be expected to do for us, and to suggest, in
addition, where work still must be done.